arXiv:1504.04935v1 [stat.ME] 20 Apr 2015


Testing the independence of two random 
vectors where only one dimension is large 

Weiming Li* 

Beijing University of Posts and Telecommunications 

and 

Jiaqi Chen^ 

Harbin Institute of Technology 
and 

Jianfeng Yao^ 

The University of Hong Kong 
April 21, 2015 


Abstract 

For testing the independence of two vectors with respective dimensions $p_1$ and $p_2$, the existing literature in high-dimensional statistics assumes that both dimensions $p_1$ and $p_2$ grow to infinity with the sample size. However, as evidenced in the RNA-sequencing data analysis discussed in the paper, it happens frequently that one of the dimensions is quite small and the other quite large compared to the sample size. In this paper, we address this new asymptotic framework for the independence test. A new test procedure is introduced and its asymptotic normality is established when the vectors are normally distributed. A Monte-Carlo study demonstrates the consistency of the procedure and exhibits its superiority over some existing high-dimensional procedures. Applied to the RNA-sequencing data mentioned above, we obtain very convincing results on pairwise independence/dependence of gene isoform expressions as attested by prior knowledge established in that field. Lastly, Monte-Carlo experiments show that the procedure is robust against the normality assumption on the population vectors.

1 Introduction


Modern scientific research increasingly encounters high-dimensional data, which calls for corresponding statistical analyses. In genomics, next-generation sequencing techniques such as RNA-Sequencing (Feng et al., 2013) are designed to quantify gene expression, where typically a group of gene isoforms are analyzed and their expression data at exon levels are recorded into multidimensional vectors. The dimensions of these vectors vary in a wide range where the smallest dimension can be one or two and the largest one can be comparable to the sample size (see Table 3). A fundamental issue in such analyses is determining whether there is any interaction between two given gene isoforms. More formally, this problem involves testing the independence of two possibly correlated vectors in a situation where one dimension is small but the other is large compared to the sample size.

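Generally, let $X = (X_1, \ldots, X_{p_1})$, $Y = (Y_1, \ldots, Y_{p_2})$ and $Z = (X, Y)$ be the joint vector of dimension $p := p_1 + p_2$. The covariance matrix of $Z$ is partitioned as

$$\Sigma = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix},$$

so that $\Sigma_{xx} = \mathrm{Var}(X)$, $\Sigma_{yy} = \mathrm{Var}(Y)$ and $\Sigma_{xy} = \mathrm{Cov}(X, Y)$. Let $z_1, \ldots, z_N$ be a sample of size $N$ drawn from the population $Z$. The sample covariance matrix is

$$S_n = \frac{1}{n} \sum_{k=1}^{N} (z_k - \bar z)(z_k - \bar z)',$$

where $\bar z = \frac{1}{N} \sum_{k=1}^{N} z_k$ and $n = N - 1$ represents the degrees of freedom. Accordingly, $S_n$ can be partitioned as

$$S_n = \begin{pmatrix} S_{xx} & S_{xy} \\ S_{yx} & S_{yy} \end{pmatrix}.$$

Assume that the joint vector $Z$ has a $p$-dimensional normal distribution with mean $\mu$ and covariance matrix $\Sigma$. The independence hypotheses of $X$ and $Y$ can then be represented as

$$H_0 : \Sigma_{xy} = 0 \quad \text{v.s.} \quad H_1 : \Sigma_{xy} \neq 0. \tag{1}$$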


To test these hypotheses, the following three statistics are commonly used (Anderson, 2003), which are the likelihood ratio test (LRT) and two trace criteria:

$$\Lambda = \frac{\sup_{H_0} L(\mu, \Sigma)}{\sup_{H_0 \cup H_1} L(\mu, \Sigma)} = \left| I - S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx} \right|^{N/2},$$

$$C_1 = \mathrm{tr}\left(S_{xy} S_{yy}^{-1} S_{yx} S_{xx}^{-1}\right) \quad \text{and} \quad C_2 = \mathrm{tr}(S_{xy} S_{yx}) - \frac{1}{n} \mathrm{tr}(S_{xx}) \mathrm{tr}(S_{yy}). \tag{2}$$

The LRT statistic is the well-known Wilks's $\Lambda$ (Wilks, 1935). Both statistics $C_1$ and $C_2$ are based on the idea that under the independence hypothesis, $\Sigma_{xy} = \Sigma_{yx}' = 0$, so that $S_{xy}$ as well as $S_{yx}$ should be small. A noticeable difference here is that the statistics $\Lambda$ and $C_1$ rely on the inverse matrices $S_{xx}^{-1}$ and $S_{yy}^{-1}$, so that essentially the conditions $p_i < n$ are required. Conversely, the criterion $C_2$ can be applied when the dimensions $p_i$, $i = 1, 2$, are larger than the sample size $N$.

The test procedures for the classical situation where the dimensions $p_i$ are reasonably small compared with the sample size are well studied (Anderson, 2003). It is however well understood today that these asymptotic approximations are no longer valid when the dimensions are comparable to the sample size, see e.g. Ledoit and Wolf (2002), Bai et al. (2009), Chen and Qin (2010) and Wang and Yao (2013). New limiting distributions have to be found in the large-dimensional context.

Specifically for the independence test, the existing literature in the large-dimensional context includes

1. the large-dimensional limit of $\Lambda$ proposed in Jiang et al. (2013) under the asymptotic scheme $\min(p_1, p_2, n) \to \infty$, $p_1 + p_2 < n$ and $p_i/n \to c_i > 0$;

2. the large-dimensional limit of $C_1$ proposed in Jiang et al. (2013) under the asymptotic scheme $\min(p_1, p_2, n) \to \infty$, $\max(p_1, p_2) < n$ and $p_i/n \to c_i > 0$; and

3. the large-dimensional limit of $C_2$ proposed in Srivastava and Reid (2012) under the asymptotic scheme $\min(p_1, p_2, n) \to \infty$, $p_i/p \to d_i > 0$ and $n = O(p^{\delta})$ for some constant $\delta > 0$ as $n \to \infty$.

These existing asymptotic schemes are quite similar in that they all require that both dimensions $p_1$ and $p_2$ grow to infinity with the sample size $N$.

Motivated by RNA-sequencing analysis, our objective in this paper is to test the hypotheses in (1) with the criterion $C_2$ assuming $p_1$ fixed and $(p_2, n) \to \infty$. As far as we know, this scheme has not been addressed in the literature. It will be proved that the asymptotic distribution of the statistic exists under this asymptotic scenario and is consistent with the one in Srivastava and Reid (2012). Note that our proof is different from theirs and this new asymptotic scenario is not covered by their results.

The rest of this paper is organised as follows. In the next section, we present the new 
test procedure and examine its size and power through simulation experiments. Section 
3 presents an analysis of a genomic data set and Section 4 presents some conclusions and 
remarks. The main theorem is proved in the last section. 

2 Test for the independence in high dimensions 


2.1 Test statistic and its asymptotic distribution 

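The null hypothesis in (1) is equivalent to $\mathrm{tr}(\Sigma_{xy} \Sigma_{yx}) = 0$. Thus we may construct an unbiased estimator of this trace and reject the null hypothesis when this statistic is too large. Let

$$\gamma_2 = \mathrm{tr}(\Sigma^2), \quad \gamma_{xx} = \mathrm{tr}(\Sigma_{xx}^2), \quad \gamma_{yy} = \mathrm{tr}(\Sigma_{yy}^2), \quad \gamma_{xy} = \mathrm{tr}(\Sigma_{xy} \Sigma_{yx}).$$

We have by definition $2\gamma_{xy} = \gamma_2 - \gamma_{xx} - \gamma_{yy}$. From Srivastava (2005), an unbiased estimator of $\gamma_2$ is given as $k_n[\mathrm{tr}(S_n^2) - \mathrm{tr}^2(S_n)/n]$ with $k_n = n^2/[(n-1)(n+2)]$. Therefore an unbiased estimator of $\gamma_{xy}$ is constructed as

$$\hat\gamma_{xy} = \frac{k_n}{2} \left\{ \mathrm{tr}(S_n^2) - \mathrm{tr}(S_{xx}^2) - \mathrm{tr}(S_{yy}^2) - \frac{1}{n} \left[ \mathrm{tr}^2(S_n) - \mathrm{tr}^2(S_{xx}) - \mathrm{tr}^2(S_{yy}) \right] \right\} = k_n \left[ \mathrm{tr}(S_{xy} S_{yx}) - \frac{1}{n} \mathrm{tr}(S_{xx}) \mathrm{tr}(S_{yy}) \right].$$

We thus get the trace criterion $C_2$ given in (2). Notice that the estimator $\hat\gamma_{xy}$ is a function of eigenvalues of the sample covariance matrices $S_{xx}$, $S_{yy}$ and $S_n$.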
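
The equality of the two forms of $\hat\gamma_{xy}$ above can be checked numerically. The following is a minimal sketch, not the authors' code: the function name `gamma_xy_both_forms`, the use of NumPy and the simulated standard normal data are assumptions made purely for illustration.

```python
import numpy as np


def gamma_xy_both_forms(x, y):
    """Return the estimator of tr(Sigma_xy Sigma_yx) computed in two ways.

    x: (N, p1) array, y: (N, p2) array; rows are observations.
    """
    N, p1 = x.shape
    n = N - 1                                    # degrees of freedom
    S = np.cov(np.hstack([x, y]), rowvar=False)  # S_n, normalised by n = N - 1
    Sxx, Sxy, Syy = S[:p1, :p1], S[:p1, p1:], S[p1:, p1:]
    kn = n**2 / ((n - 1) * (n + 2))

    # Form 1: half the difference of the three gamma_2-type estimators.
    full = 0.5 * kn * (
        np.trace(S @ S) - np.trace(Sxx @ Sxx) - np.trace(Syy @ Syy)
        - (np.trace(S) ** 2 - np.trace(Sxx) ** 2 - np.trace(Syy) ** 2) / n
    )
    # Form 2: k_n times the trace criterion C_2.
    c2 = np.trace(Sxy @ Sxy.T) - np.trace(Sxx) * np.trace(Syy) / n
    return full, kn * c2


# Illustrative data (hypothetical): p1 = 2 is small, p2 = 100 exceeds n = 50.
rng = np.random.default_rng(0)
x = rng.standard_normal((51, 2))
y = rng.standard_normal((51, 100))
full, trace_form = gamma_xy_both_forms(x, y)
print(full, trace_form)  # the two values agree up to rounding error
```

No matrix inverse appears in either form, which is why the estimator remains computable when $p_2 > n$.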

Theorem 2.1. Suppose that the dimensions $p = p_1 + p_2$ and $n$ both tend to infinity, and

$$0 < \lim_{p \to \infty} \frac{1}{p} \mathrm{tr}(\Sigma^k) < \infty, \quad k = 1, 2, 4.$$

Then under the null hypothesis in (1),

$$T_n := \frac{n \hat\gamma_{xy}}{\sqrt{2 k_n} \sqrt{\hat\gamma_{xx} \hat\gamma_{yy}}} \xrightarrow{d} N(0, 1), \tag{3}$$

where $\hat\gamma_{xx} = k_n[\mathrm{tr}(S_{xx}^2) - \mathrm{tr}^2(S_{xx})/n]$ and $\hat\gamma_{yy} = k_n[\mathrm{tr}(S_{yy}^2) - \mathrm{tr}^2(S_{yy})/n]$ with $k_n = n^2/[(n-1)(n+2)]$.

This theorem is built on a general dimensional scenario as only the assumption $p_1 + p_2 \to \infty$ is required. This scenario integrates two cases: 1) $p_1$ is fixed and only $p_2$ approaches infinity; 2) $p_1$ and $p_2$ both tend to infinity. Under the second case, the conclusion in (3) is essentially the same as the main theorem in Srivastava and Reid (2012). This means that for practical applications, the proposed test is robust against different asymptotic scenarios of dimensions. Such robustness is especially welcome since in a concrete application (such as the gene isoform data analyzed in the paper) the explicit values of the dimensions $p_1$ and $p_2$ are known and it is somewhat difficult to decide which asymptotic scenario is the most convenient to use.
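
To make the statistic in (3) concrete, here is a minimal sketch of how $T_n$ could be computed from two data blocks. It is an illustration under stated assumptions rather than the authors' implementation: the function name `trace_independence_test` is hypothetical, and the one-sided p-value (upper tail of $N(0,1)$) is an assumption based on the remark that the null hypothesis is rejected when the statistic is too large.

```python
import math
import numpy as np


def trace_independence_test(x, y):
    """Return (T_n, one-sided p-value) for H0: Sigma_xy = 0.

    x: (N, p1) array, y: (N, p2) array; rows are observations.
    """
    N, p1 = x.shape
    n = N - 1
    S = np.cov(np.hstack([x, y]), rowvar=False)  # normalised by n = N - 1
    Sxx, Sxy, Syy = S[:p1, :p1], S[:p1, p1:], S[p1:, p1:]
    kn = n**2 / ((n - 1) * (n + 2))
    # tr(A A') equals the sum of squared entries of A.
    g_xy = kn * (np.sum(Sxy**2) - np.trace(Sxx) * np.trace(Syy) / n)
    g_xx = kn * (np.sum(Sxx**2) - np.trace(Sxx) ** 2 / n)
    g_yy = kn * (np.sum(Syy**2) - np.trace(Syy) ** 2 / n)
    t_n = float(n * g_xy / math.sqrt(2.0 * kn * g_xx * g_yy))
    p_value = 0.5 * math.erfc(t_n / math.sqrt(2.0))  # P(N(0,1) > t_n)
    return t_n, p_value


# Illustrative data (hypothetical): in the first call y contains x itself.
rng = np.random.default_rng(1)
x = rng.standard_normal((200, 2))
noise = rng.standard_normal((200, 20))
t_dep, p_dep = trace_independence_test(x, np.hstack([x, noise]))
t_ind, p_ind = trace_independence_test(x, noise)
print(t_dep, p_dep, t_ind, p_ind)
```

The same function applies whether $p_1$ is 1 or comparable to $p_2$, which mirrors the point made above that the user does not have to choose an asymptotic scenario.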


2.2 Monte-Carlo study 

We numerically evaluate the finite-sample performance of the test and report the empirical size and power under different dimension settings. For the purpose of comparison, we also consider two tests discussed in Jiang et al. (2013): one is the corrected LRT, referred to as $T_1$, and the other is based on $\mathrm{tr}(S_{xy} S_{yy}^{-1} S_{yx} S_{xx}^{-1})$, referred to as $T_2$. Since the test $T_1$ is limited to $p_1 + p_2 < n$ and $T_2$ is limited to $\max\{p_1, p_2\} < n$, we only consider the former case when comparing the three tests. The nominal significance level is fixed at $\alpha = 0.05$, and the number of independent replications is 100,000.

We first report the empirical sizes of the three tests. Samples are drawn from a standard normal population, and thus $\Sigma$ is an identity matrix. The dimensions are $p_1 = 2, 6, 10$, $p_2 = 10, 30, 100, 200, 500$, and $n = 50$. The results are collected in Table 1, where the first six columns compare the sizes of the three tests when $p_1 + p_2 < n$ and the last three columns illustrate the size of the proposed $T_n$ when $p_2 > n$. The results show that all the empirical sizes are close to the nominal significance level.
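
The size experiment described above can be sketched as follows. This is a reduced illustration, not the simulation code behind the reported figures: it uses 2,000 replications instead of 100,000, takes $N = n + 1$ observations, and assumes a one-sided rejection rule $T_n > z_{0.95} \approx 1.6449$; the function names are hypothetical.

```python
import math
import numpy as np


def t_n(z, p1):
    """Statistic T_n for an (N, p1 + p2) data matrix whose first p1 columns form X."""
    n = z.shape[0] - 1
    S = np.cov(z, rowvar=False)
    Sxx, Sxy, Syy = S[:p1, :p1], S[:p1, p1:], S[p1:, p1:]
    kn = n**2 / ((n - 1) * (n + 2))
    g_xy = kn * (np.sum(Sxy**2) - np.trace(Sxx) * np.trace(Syy) / n)
    g_xx = kn * (np.sum(Sxx**2) - np.trace(Sxx) ** 2 / n)
    g_yy = kn * (np.sum(Syy**2) - np.trace(Syy) ** 2 / n)
    return float(n * g_xy / math.sqrt(2.0 * kn * g_xx * g_yy))


def empirical_size(p1, p2, n, reps=2000, seed=0):
    """Fraction of rejections at nominal level 0.05 when Sigma is the identity."""
    rng = np.random.default_rng(seed)
    z_crit = 1.6449  # approximate upper 5% point of N(0, 1)
    rejections = 0
    for _ in range(reps):
        z = rng.standard_normal((n + 1, p1 + p2))  # N = n + 1 observations
        rejections += int(t_n(z, p1) > z_crit)
    return rejections / reps


size = empirical_size(p1=2, p2=100, n=50)
print(f"empirical size: {100 * size:.2f}%")
```

With so few replications the estimate carries a Monte-Carlo standard error of roughly half a percentage point, so it should only be read as a rough check against the nominal 5% level.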


Table 1: Empirical sizes in percent for the three tests with the significance level $\alpha = 0.05$.

           p_2 = 10               p_2 = 30               T_n, p_2 =
p_1     T_n    T_1    T_2      T_n    T_1    T_2      100    200    500
 2      6.32   6.56   5.86     5.72   6.17   4.48     5.52   5.34   5.30
 6      5.89   6.11   5.37     5.66   5.88   4.69     5.46   5.21   5.29
10      5.74   6.03   5.27     5.46   5.90   4.70     5.36   5.09   5.16

To examine the powers of the three tests, we employ a model studied in Jiang et al. (2013), where the populations $X$ and $Y$ are defined as

$$X = U_1 + \gamma \tilde U_2, \quad Y = U_2 + \gamma U_2, \quad U_i \sim N(0, I_{p_i}), \quad i = 1, 2,$$

respectively, where $U_1$ and $U_2$ are independent, $\tilde U_2$ is a subset of $U_2$ consisting of its first $p_1$ variables, and the factor $\gamma$ represents the degree of mixture. Therefore, the covariance matrices are respectively

$$\Sigma_{xx} = (1 + \gamma^2) I_{p_1}, \quad \Sigma_{yy} = (1 + \gamma)^2 I_{p_2}, \quad \Sigma_{xy} = \gamma (1 + \gamma) \left( I_{p_1}, 0_{p_1, p_2 - p_1} \right),$$

where $0_{m,n}$ represents an $m \times n$ zero matrix.
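
The mixture model above is easy to simulate, and doing so gives a quick check of the stated covariance matrices. The sketch below is illustrative only; the function name `sample_mixture_model`, the sample size and the parameter values are assumptions, not settings taken from the paper.

```python
import numpy as np


def sample_mixture_model(N, p1, p2, gamma, rng):
    """Draw N observations with X = U1 + gamma * U2_tilde and Y = U2 + gamma * U2."""
    if p1 > p2:
        raise ValueError("the model needs p1 <= p2")
    u1 = rng.standard_normal((N, p1))
    u2 = rng.standard_normal((N, p2))
    x = u1 + gamma * u2[:, :p1]   # U2_tilde: first p1 coordinates of U2
    y = (1.0 + gamma) * u2
    return x, y


# Hypothetical settings chosen so that sample covariances are accurate.
rng = np.random.default_rng(2)
gamma, p1, p2 = 0.5, 2, 5
x, y = sample_mixture_model(20000, p1, p2, gamma, rng)
S = np.cov(np.hstack([x, y]), rowvar=False)

# Population cross-covariance gamma * (1 + gamma) * (I_{p1}, 0).
sigma_xy = gamma * (1 + gamma) * np.hstack([np.eye(p1), np.zeros((p1, p2 - p1))])
max_err = np.max(np.abs(S[:p1, p1:] - sigma_xy))
print(max_err)
```

Setting `gamma = 0` recovers the null hypothesis, so the same sampler can be used for both size and power experiments.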


Figure 1: Empirical powers of the three tests. The parameter settings are $(p_1, p_2, n) = (4, 30, 50)$, $0 \le \gamma \le 0.9$ in the left panel, and $(p_1, n, \gamma) = (4, 50, 0.5)$, $5 \le p_2 \le 45$ in the right panel.

Figure 1 illustrates the powers of the three tests for this model. In the left panel, the parameters are $(p_1, p_2, n) = (4, 30, 50)$ and the factor $\gamma$ increases from 0 to 0.9; while on the right, $(p_1, n, \gamma) = (4, 50, 0.5)$ and $p_2$ increases from 5 to 45. The curves in the figure show that the powers of the tests $T_1$ and $T_2$ are similar, and are dominated by the proposed test $T_n$ in all the settings. In particular, the curves in the right panel show that the powers of all the tests decrease as $p_2$ increases, which reflects the fact that in this process the increasing number of zero entries of $\Sigma_{xy}$ makes it closer to the zero matrix of the null hypothesis. However, the power of $T_n$ declines much more slowly than those of $T_1$ and $T_2$, which demonstrates a greater robustness of $T_n$ against the inflating $p_2$.

Next we examine the robustness of the three test procedures when the assumed normal distribution of the vectors is contaminated by gamma-distributed errors. The studied model is the same as the previous one except that the vectors $U_i$ are replaced by

$$U_i + \theta V_i, \quad V_i = (v_{i1}, \ldots, v_{i p_i})', \quad i = 1, 2,$$

where $(v_{ij})$, independent of $\{U_i\}$, are i.i.d. standardized random variables derived from $\mathrm{Gamma}(a, b)$ distributed variables and the parameter $\theta$ represents the level of contamination. The new parameters are set to be $a = b = 3$ (positive skew, heavy-tailed) and $\theta = 1/2, 2$ in this experiment. Thus the covariance matrices become

$$\Sigma_{xx} = (1 + \gamma^2)(1 + \theta^2) I_{p_1}, \quad \Sigma_{yy} = (1 + \gamma)^2 (1 + \theta^2) I_{p_2}, \quad \Sigma_{xy} = \gamma (1 + \gamma)(1 + \theta^2) \left( I_{p_1}, 0_{p_1, p_2 - p_1} \right).$$
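
A contaminated vector $U_i + \theta V_i$ of this kind can be generated as in the sketch below. It is only an illustration: the function name `contaminated_normal` is hypothetical, and the text does not say whether $b$ in $\mathrm{Gamma}(a, b)$ is a scale or a rate, so the code assumes a scale parameter. Because the gamma variables are standardized, that choice does not affect the distribution of $V_i$.

```python
import numpy as np


def contaminated_normal(N, p, theta, rng, shape=3.0, scale=3.0):
    """Return an (N, p) sample of U + theta * V.

    U has i.i.d. N(0, 1) entries; V has i.i.d. standardized Gamma(shape, scale)
    entries (mean 0, variance 1), independent of U.
    """
    u = rng.standard_normal((N, p))
    g = rng.gamma(shape, scale, size=(N, p))
    # A Gamma(shape, scale) variable has mean shape*scale and variance shape*scale^2.
    v = (g - shape * scale) / (np.sqrt(shape) * scale)
    return u + theta * v


# Hypothetical check: each coordinate should have variance 1 + theta^2.
rng = np.random.default_rng(4)
theta = 2.0
w = contaminated_normal(50000, 2, theta, rng)
print(w.mean(axis=0), w.var(axis=0), 1 + theta**2)
```

Substituting such vectors for $U_1$ and $U_2$ in the mixture model multiplies every covariance block by $1 + \theta^2$, in agreement with the matrices displayed above.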


Results about the empirical sizes and powers of the tests are collected in Table 2 and Figure 2, respectively. They show that all the sizes are close to the nominal one and the power curves are quite similar to those in Figure 1, which demonstrates that the additional gamma-distributed errors have little impact on the three tests. It is however worth noticing that the theoretical proof of Theorem 2.1 in this paper as well as the proofs for asymptotic normality of the test criteria $T_1$ and $T_2$ established in Jiang et al. (2013) all heavily rely on the assumed normality of the vectors, and to the best of our knowledge, it seems unclear how these proofs can be extended to cover non-normal data such as the ones tested in the Monte-Carlo experiments reported here.


Table 2: Empirical sizes in percent for the three tests with the significance level $\alpha = 0.05$.

                      p_2 = 10               p_2 = 30               T_n, p_2 =
theta      p_1     T_n    T_1    T_2      T_n    T_1    T_2      100    200    500
theta=1/2   2      6.38   6.56   5.90     5.86   6.19   4.45     5.55   5.32   5.22
            6      5.91   6.17   5.44     5.47   5.84   4.60     5.34   5.24   5.16
           10      5.71   5.94   5.18     5.55   5.80   4.80     5.33   5.12   5.22
theta=2     2      6.38   6.52   5.80     6.00   6.38   4.62     5.59   5.59   5.33
            6      6.02   6.15   5.49     5.65   5.79   4.67     5.42   5.42   5.22
           10      5.85   6.03   5.27     5.68   5.82   4.84     5.35   5.33   5.06


3 Real data analysis


Genomes play a central role in the control of cellular processes (Barabasi and Oltvai, 2004). The dynamic interplay between various genes can be mapped as gene co-expression networks, which is an important and widely used method to understand the cause and prognosis of various diseases. To recover pairwise dependencies in a gene co-expression network, each co-expression edge has to be inferred by accepting or rejecting the independence hypothesis from the sample covariance matrix of respective isoform expressions.

We analyze a data set of liver cancer, which is downloaded from the TCGA data portal: https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm, and filtered by data types RNASeqV2 and Level 3. The data set consists of 38 genes with their dimensions ranging from 1 to 31 (see Table 3) and their sample size is $N = 50$. Obviously the dimensions are not on the same order of magnitude as the sample size. For these genes, the dependency relationships are fully known based on established knowledge from historical experiments: 29 pairs of them are dependent and the remaining 674 pairs are independent.

Figure 2: Empirical powers of the three tests for the non-normal distribution with $\theta = 1/2, 2$. The parameter settings are $(p_1, p_2, n) = (4, 30, 50)$, $0 \le \gamma \le 0.9$ in the left panel, and $(p_1, n, \gamma) = (4, 50, 0.5)$, $5 \le p_2 \le 45$ in the right panel.

We test the pairwise gene dependencies using $T_n$ and compare the results with those from two other methods: one is from Hong et al. (2013), which is a variant of traditional canonical correlation analysis (CCA); the other is the large-dimensional trace criterion $T_2$, which was recently applied in Yalamanchili et al. (2014) and was shown there to perform better than CCA. The corrected LRT $T_1$ is excluded from this comparison since its dimensional requirement is not met for the data set. The significance level is set to be $\alpha = 0.05$. To evaluate the accuracy of the test results, we employ the so-called F-score (Powers, 2007), which measures the trade-off between precision $P$ and recall $R$:

$$F = 2 \times \frac{P \times R}{P + R}, \tag{4}$$

where

$$P = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}, \quad R = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}.$$

With the prior information of dependency, the number of true positives is the number of correctly identified correlated pairs of genes, the number of false positives is the number of pairs wrongly identified as correlated, and the number of false negatives is the number of pairs wrongly identified as uncorrelated.
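
The F-score in (4) is straightforward to compute from the three counts. The sketch below is illustrative: the function name `f_score` is hypothetical, the counts in the usage example are made up and are not taken from the data analysis, and returning 0 when there are no true positives is a convention chosen here to avoid a 0/0 expression.

```python
def f_score(true_pos, false_pos, false_neg):
    """F-score (4): harmonic mean of precision P and recall R."""
    if true_pos == 0:
        return 0.0  # convention chosen here; P and R are 0 or undefined
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts, not taken from the paper's data analysis.
score = f_score(true_pos=20, false_pos=10, false_neg=9)
print(round(score, 4))
```

Since true negatives do not enter the formula, the score is not inflated by the large number of independent pairs in the data set.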

The F-scores reported in Table 4 show that $T_n$ outperforms $T_2$ significantly. CCA fails to detect the relationship between gene NM002228 and other genes because the dimension of this gene is 1. The same phenomenon happens to gene NM005195. Therefore, we cannot get an F-score for CCA.

Next, we remove the 1-dimensional genes from the data set in order to incorporate CCA for comparison. The remaining 36 genes include 25 dependent pairs and 605 independent pairs. The F-scores collected in Table 5 demonstrate that $T_n$ again outperforms the others.

Table 3: Lung cancer data: 38 genes with different dimensions

Name        NM000222     NM000321     NM000636     NM000791     NM001126116  NM001140
Dimension   20           27           4            6            7            13
Name        NM001145102  NM001204191  NM001237     NM001429     NM001759     NM001760
Dimension   9            7            8            31           5            4
Name        NM001786     NM001880     NM001950     NM002198     NM002228     NM002421
Dimension   4            13           10           9            1            10
Name        NM002467     NM002505     NM002539     NM002985     NM003109     NM003153
Dimension   3            10           11           3            6            22
Name        NM003221     NM003998     NM004379     NM004417     NM005194     NM005238
Dimension   7            23           8            2            1            8
Name        NM005239     NM005252     NM005438     NM007122     NM022457     NM033285
Dimension   10           2            4            11           20           4
Name        NM053056     NM198253
Dimension   5            15




Table 4: F-scores for the data set including 38 genes.
Method    T_n     T_2     CCA
F-score   0.64    0.40    NA


Notice that such results on pairwise dependence of gene isoform expressions are further used to construct gene co-expression networks, see Yalamanchili et al. (2014).


Table 5: F-scores for the data set including 36 genes.
Method    T_n     T_2     CCA
F-score   0.6465  0.4238  0.4187

4 Concluding remarks 

This paper investigates the independence test of two vectors in a high-dimensional situation where one of the dimensions, $p_1$, is quite small while the other dimension $p_2$ is large compared to the sample size. The asymptotic scheme is novel and practically useful. A new procedure is introduced and the test statistic under the null is proved to be asymptotically normally distributed assuming that $p_1 + p_2 \to \infty$ and the vectors are normally distributed. The power of the proposed test is studied through Monte-Carlo simulations and a real data analysis, which demonstrates the superiority of the new test over the existing ones. Another interesting feature found in the Monte-Carlo study is that the proposed procedure is robust against deviations from the normality assumption on the vectors, although a theoretical proof of this fact is still missing.

5 Proofs 

5.1 Lemma 

Lemma 5.1. Let $u$, $v$, and $w$ be independent vectors of $n$-dimensional standard normal distribution $N(0, I_n)$, and define

$$\psi(x, y) = \frac{1}{n} (x'y)^2 - \frac{1}{n^2} (x'x)(y'y), \tag{5}$$

then

$$E[\psi(u, v) \mid u] = 0, \quad E[\psi(u, v) \psi(w, v) \mid u, w] = \frac{2}{n} \psi(u, w),$$

$$E[\psi(v, v)] = (n-1)(n+2)/n, \quad E[\psi^2(u, v)] = 2(n-1)(n+2)/n^2,$$

$$E[\psi^2(v, v)] = O(n^2), \quad \mathrm{Var}[\psi(v, v)] = O(n), \quad E[\psi^4(u, v)] = O(1),$$

as $n \to \infty$.

Proof. The distribution of $v'v$ is $\chi^2(n)$ and the conditional distribution of $u'v \mid u$ is $N(0, u'u)$, thus $E[\psi(u, v) \mid u] = 0$. Write

$$\psi(u, v) \psi(w, v) = \frac{1}{n^2} (u'v)^2 (w'v)^2 - \frac{1}{n^3} (u'v)^2 (w'w)(v'v) - \frac{1}{n^3} (w'v)^2 (u'u)(v'v) + \frac{1}{n^4} (u'u)(w'w)(v'v)^2 := \frac{1}{n^2} S_1 - \frac{1}{n^3} S_2 - \frac{1}{n^3} S_3 + \frac{1}{n^4} S_4.$$

Then $E(S_4 \mid u, w) = n(n+2)(u'u)(w'w)$, and

$$E(S_1 \mid u, w) = \sum_{i,j,k,l} u_i u_j w_k w_l E(v_i v_j v_k v_l) = \sum_{i=j,\,k=l} u_i u_j w_k w_l + \sum_{i=k,\,j=l} u_i u_j w_k w_l + \sum_{i=l,\,j=k} u_i u_j w_k w_l = (u'u)(w'w) + 2(u'w)^2,$$

$$E(S_2 \mid u, w) = (w'w) \sum_{i,k} u_i^2 E(v_i^2 v_k^2) = (n+2)(u'u)(w'w),$$

and similarly $E(S_3 \mid u, w) = E(S_2 \mid u, w)$, where $(x_i)$ denote the elements of $x$. Collecting these results, we get $E[\psi(u, v) \psi(w, v) \mid u, w] = (2/n) \psi(u, w)$.

Notice that $\psi(v, v) = (n-1)(v'v)^2/n^2$, and $E(v'v)^k = n(n+2) \cdots (n+2k-2)$, $k \in \mathbb{N}^+$. We have then,

$$E[\psi(v, v)] = (n-1)(n+2)/n,$$

$$E[\psi^2(u, v)] = (2/n) E[\psi(u, u)] = 2(n-1)(n+2)/n^2,$$

$$E[\psi^2(v, v)] = E(v'v)^4 (n-1)^2/n^4 = O(n^2),$$

$$\mathrm{Var}[\psi(v, v)] = \left[ E(v'v)^4 - E^2(v'v)^2 \right] (n-1)^2/n^4 = O(n).$$

Finally, from the Minkowski inequality and the fact that $u'v \mid v$ is $N(0, v'v)$, so that $E(u'v)^8 = 105\, E(v'v)^4$,

$$E[\psi^4(u, v)] = \frac{1}{n^4} E\left| (u'v)^2 - (u'u)(v'v)/n \right|^4 \le \frac{1}{n^4} \left\{ \left[ E(u'v)^8 \right]^{1/4} + \left[ E(u'u)^4 E(v'v)^4 \right]^{1/4} / n \right\}^4 = \frac{1}{n^4} \left\{ \left[ 105\, E(v'v)^4 \right]^{1/4} + \left[ E(v'v)^4 \right]^{1/2} / n \right\}^4,$$

which is $O(1)$ as $n \to \infty$. □
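
The exact moment identities of Lemma 5.1 can be sanity-checked by simulation. The sketch below is an illustration under assumptions, not part of the proof: the dimension $n = 20$, the number of replications and the function name `psi` are arbitrary choices, and the comparison is only up to Monte-Carlo error.

```python
import numpy as np


def psi(x, y):
    """psi(x, y) = (x'y)^2 / n - (x'x)(y'y) / n^2, applied row-wise."""
    n = x.shape[-1]
    xy = np.sum(x * y, axis=-1)
    xx = np.sum(x * x, axis=-1)
    yy = np.sum(y * y, axis=-1)
    return xy**2 / n - xx * yy / n**2


n, reps = 20, 200_000
rng = np.random.default_rng(3)
u = rng.standard_normal((reps, n))  # each row is one draw of u
v = rng.standard_normal((reps, n))  # each row is one draw of v

mean_psi = psi(u, v).mean()            # lemma: E[psi(u, v)] = 0
mean_psi_sq = (psi(u, v) ** 2).mean()  # lemma: 2(n-1)(n+2)/n^2
mean_psi_vv = psi(v, v).mean()         # lemma: (n-1)(n+2)/n

print(mean_psi, 0.0)
print(mean_psi_sq, 2 * (n - 1) * (n + 2) / n**2)
print(mean_psi_vv, (n - 1) * (n + 2) / n)
```

The second identity is the one that fixes the normalisation $\sqrt{2 k_n}$ in $T_n$, since $2(n-1)(n+2)/n^2 = 2/k_n$.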

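5.2 Proof of Theorem 2.1

The sample covariance $S_n$ has the Wishart distribution $W_n(\Sigma)$ with $n$ degrees of freedom. It can be expressed as $\sum_{k=1}^{n} z_k z_k'/n$ where $(z_k)$ are i.i.d. $N(0, \Sigma)$. Write $z_i = (x_i', y_i')' = (x_{i1}, \ldots, x_{i p_1}, y_{i1}, \ldots, y_{i p_2})'$, $i = 1, \ldots, n$, and denote $X = (x_1, \ldots, x_n)$ and $Y = (y_1, \ldots, y_n)$. Note that the matrices $X$ and $Y$ contain normal vectors which are independent under $H_0$. The matrices $X'X$ and $Y'Y$ can be standardized as

$$X'X = \sum_{i=1}^{p_1} \alpha_i u_i u_i', \quad Y'Y = \sum_{j=1}^{p_2} \beta_j v_j v_j',$$

where $(\alpha_i)$ and $(\beta_j)$ are the eigenvalues of $\Sigma_{xx}$ and $\Sigma_{yy}$, respectively, and $(u_i)$, $(v_j)$ are i.i.d. $N(0, I_n)$. Therefore, we have

$$\frac{n}{k_n} \hat\gamma_{xy} = n\, \mathrm{tr}(S_{xy} S_{yx}) - \mathrm{tr}(S_{xx}) \mathrm{tr}(S_{yy}) = \frac{1}{n} \mathrm{tr}(X'X Y'Y) - \frac{1}{n^2} \mathrm{tr}(X'X) \mathrm{tr}(Y'Y)$$

$$= \sum_{i=1}^{p_1} \sum_{j=1}^{p_2} \alpha_i \beta_j \left[ \frac{1}{n} (u_i' v_j)^2 - \frac{1}{n^2} (u_i' u_i)(v_j' v_j) \right] = \sum_{i=1}^{p_1} \sum_{j=1}^{p_2} a_{ij} \psi(u_i, v_j),$$

where $a_{ij} = \alpha_i \beta_j$ and $\psi$ is defined in (5) with the dimension $n$, $i = 1, \ldots, p_1$, $j = 1, \ldots, p_2$.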
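We use the martingale CLT to establish the limiting distribution of $T_n$. Without loss of generality suppose that $p_1 \le p_2$, and define $\theta_j^{(n)} = (1/\sqrt{p_1 p_2}) \sum_{i=1}^{p_1} a_{ij} \psi(u_i, v_j)$, $j = 1, \ldots, p_2$. Let $\mathcal{F}_j^{(n)}$ be the $\sigma$-algebra generated by the random variables $\{u_1, \ldots, u_{p_1}, v_1, \ldots, v_j\}$, then $\{\emptyset, \Omega\} = \mathcal{F}_0^{(n)} \subset \mathcal{F}_1^{(n)} \subset \cdots \subset \mathcal{F}_{p_2}^{(n)} \subset \mathcal{F}$ with $(\Omega, \mathcal{F}, P)$ the probability space. From Lemma 5.1 and the law of iterated expectations,

$$E\left[ \theta_j^{(n)} \mid \mathcal{F}_{j-1}^{(n)} \right] = \frac{1}{\sqrt{p_1 p_2}} \sum_{i=1}^{p_1} a_{ij} E(\psi(u_i, v_j) \mid u_i) = 0,$$

$$E\left[ \left( \theta_j^{(n)} \right)^2 \right] = \frac{1}{p_1 p_2} \sum_{i=1}^{p_1} \sum_{k=1}^{p_1} a_{ij} a_{kj} E[\psi(u_i, v_j) \psi(u_k, v_j)] = \frac{2(n-1)(n+2)}{n^2 p_1 p_2} \sum_{i=1}^{p_1} a_{ij}^2,$$

which is $O(1/p_2)$ as $(p, n) \to \infty$. Thus $(\theta_j^{(n)})$ forms a sequence of integrable martingale differences. On the other hand,

$$\sum_{j=1}^{p_2} E\left[ \left( \theta_j^{(n)} \right)^2 \mid \mathcal{F}_{j-1}^{(n)} \right] = \frac{1}{p_1 p_2} \sum_{j=1}^{p_2} \sum_{i=1}^{p_1} \sum_{k=1}^{p_1} a_{ij} a_{kj} E(\psi(u_i, v_j) \psi(u_k, v_j) \mid u_i, u_k) = \frac{2}{n p_1 p_2} \sum_{i=1}^{p_1} \sum_{k=1}^{p_1} b_{ik} \psi(u_i, u_k)$$

$$= \frac{2}{n p_1 p_2} \sum_{i=1}^{p_1} b_{ii} \psi(u_i, u_i) + \frac{2}{n p_1 p_2} \sum_{i \neq k} b_{ik} \psi(u_i, u_k) := A_{1n} + A_{2n},$$

where $b_{ik} = \sum_{j=1}^{p_2} a_{ij} a_{kj}$, $i, k = 1, \ldots, p_1$. Considering the variances of $A_{1n}$ and $A_{2n}$, $\mathrm{Var}(A_{1n}) = O(1/n)$ and

$$\mathrm{Var}(A_{2n}) = \frac{4}{n^2 p_1^2 p_2^2} \sum_{i \neq k} \sum_{l \neq s} b_{ik} b_{ls} E[\psi(u_i, u_k) \psi(u_l, u_s)],$$

which is $O(1/n^2)$. Therefore, from the Chebyshev inequality,

$$\sum_{j=1}^{p_2} E\left[ \left( \theta_j^{(n)} \right)^2 \mid \mathcal{F}_{j-1}^{(n)} \right] - \sum_{j=1}^{p_2} E\left[ \left( \theta_j^{(n)} \right)^2 \right] \xrightarrow{p} 0, \quad (p, n) \to \infty,$$

where the second sum has the expression $\sigma_n^2 := 2(1 - 1/n)(1 + 2/n) \gamma_{xx} \gamma_{yy} / (p_1 p_2)$.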

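Next we verify the Lyapunov condition by showing that $B_n := \sum_{j=1}^{p_2} E[(\theta_j^{(n)})^4] \to 0$. From Lemma 5.1 and the law of iterated expectations,

$$B_n = \frac{1}{p_1^2 p_2^2} \sum_{j=1}^{p_2} \sum_{i=1}^{p_1} \sum_{l=1}^{p_1} \sum_{s=1}^{p_1} \sum_{t=1}^{p_1} a_{ij} a_{lj} a_{sj} a_{tj} E[\psi(u_i, v_j) \psi(u_l, v_j) \psi(u_s, v_j) \psi(u_t, v_j)]$$

$$= \frac{1}{p_1^2 p_2^2} \sum_{j=1}^{p_2} \sum_{i=1}^{p_1} a_{ij}^4 E[\psi^4(u_i, v_j)] + \frac{3}{p_1^2 p_2^2} \sum_{j=1}^{p_2} \sum_{i \neq s} a_{ij}^2 a_{sj}^2 E[\psi^2(u_i, v_j) \psi^2(u_s, v_j)]$$

$$= \frac{1}{p_1^2 p_2^2} \sum_{j=1}^{p_2} \sum_{i=1}^{p_1} a_{ij}^4 E[\psi^4(u_i, v_j)] + \frac{12}{n^2 p_1^2 p_2^2} \sum_{j=1}^{p_2} \sum_{i \neq s} a_{ij}^2 a_{sj}^2 E[\psi^2(v_j, v_j)],$$

which is $O(1/p_2)$ as $(p, n) \to \infty$.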

Notice that $\hat\gamma_{xx}$ and $\hat\gamma_{yy}$ are unbiased and consistent estimators of $\gamma_{xx}$ and $\gamma_{yy}$, respectively. The statistic $\hat\sigma_n^2 := 2(1 - 1/n)(1 + 2/n) \hat\gamma_{xx} \hat\gamma_{yy} / (p_1 p_2)$ is also an unbiased and consistent estimator of $\sigma_n^2$ under the null hypothesis, therefore

$$T_n = \frac{n \hat\gamma_{xy}}{\sqrt{2 k_n} \sqrt{\hat\gamma_{xx} \hat\gamma_{yy}}} = \frac{1}{\hat\sigma_n} \sum_{j=1}^{p_2} \theta_j^{(n)} \xrightarrow{d} N(0, 1),$$

as $(p, n) \to \infty$.